Remarks

Reconsideration of the above-captioned application is respectfully requested. All previously pending
claims (1-23) have been rejected as being obvious over Fayyad et al. '708 in view of Tendick.

To overcome the rejections, Claim 1 has been amended to recite using a distribution of the perturbed
data to generate an estimate of a distribution of the original data, and using the estimate of the distribution
of the original data to generate a data mining model. Support for this amendment can be found on page 7,
line 21 continuing to page 8, line 1 and in Figure 4. Claim 7 has been amended to recite that the server does 
not have access to the original values, as disclosed on page 7, line 14. Claim 21 has been amended to correct 
an informality in verb tense, and Claims 14-19 have been canceled. Claims 1-13 and 15-23, of which Claims 
1, 7, and 13 are independent, remain pending. 



Under 35 U.S.C. §103

The claims have been rejected under 35 U.S.C. §103 as being unpatentable over Fayyad et al. '708
in view of Tendick, with the secondary reference being used as a teaching of maintaining privacy. Fayyad
et al. is directed only to using a perturbed value of a mean of original values, not the original values
themselves, to find a starting centroid for candidate data clusters, col. 2, lines 4-5, that is "cheaper" than
[illegible] operation, col. [illegible]. Fayyad et al. does not use perturbed values of
original values at all. Instead, once the starting centroids are found, Fayyad et al. use actual original data
to generate the models.

With this understanding of Fayyad et al. in mind, attention is directed to the present claims, starting 
with Claim 1. As now amended, Claim 1 recites using a distribution of perturbed values to generate an 
estimate of the original distribution, and then using this estimate to generate the model. Fayyad et al.
appears to be completely silent on this previously unclaimed feature.

Claim 7 explicitly requires randomizing, at a user computer, original values of numeric attributes to
render perturbed values, and then sending the perturbed values to a server computer not having access to the
original data for processing the perturbed values to generate a model. The examiner has taken the position
that the perturbation of the mean of the centroid discussed at col. 2, lines 32-36 of Fayyad et al. reads on
perturbing original data. He has further taken the position that the discussion of the computer system
peripherals at col. 4, lines 37-41 reads on [illegible]
at a server.

As now amended, Claim 7 requires [illegible] of the examiner's comment [illegible] Fayyad et al.
[illegible] it is clear [illegible] processing [illegible] overcome [illegible].

Additionally, Applicant respectfully disagrees with the examiner's rationale, summarized above. First,
the mean of a set of original values is not the same thing as the original values themselves; it is a
statistical representation of the original values. Consequently, perturbing a mean value as Fayyad et al. does
is not the same thing as perturbing original values. Fayyad
et al. at most can be said to perturb a statistical representation of original values. Since Fayyad et al.
nowhere considers privacy, but only finding a good starting point for cluster centroids, there is absolutely
no reason for Fayyad et al. to suggest or be modified to perturb anything other than a statistic, and not
original values.



Received from <6193388078> at 4/17/02 7:33:52 PM [Eastern Daylight Time]

04/17/2002 16:29 6193388078 ROGITZ & ASSOC. PAGE 07



CASE NO.: AM9-99-0226 
Serial No.: 09/487,191 
April 17, 2002 
Page 7 



PATENT 
Filed: January 19, 2000 



Second, Applicant believes that the relied-upon portion of Fayyad et al. alleged to teach generating 
perturbed values at a user computer and sending them on to a server for processing is not quite accurate. 
While Fayyad et al. contains a large volume of boilerplate about computer peripherals, and the fact that there 
are such things as computer networks (see col. 5, lines 4-30), nothing in Fayyad et al. suggests generating 
the perturbed values at a user computer, but processing them on a server computer. There is simply no 
reason to do so since Fayyad et al. is not directed to privacy. In any case, wherever the
perturbed data is generated in Fayyad et al., there is no teaching or suggestion that the original data remain
unavailable to the computer that generates the model. If the original data were not available, the invention
of Fayyad et al. wouldn't work, see MPEP §2143.01 (citing In re Gordon).

With respect to dependent Claim 12, the examiner alleges that Fayyad et al. col. 10, lines 18-25 
teaches perturbing categorical values of categorical attributes by selectively replacing the categorical values 
with other values based on a probability. The relied-upon section of Fayyad et al., however, nowhere 
mentions the concept of "categorical attributes". In fact, the entire patent doesn't mention the concept.
Fayyad et al. appears to exclusively consider numerical attributes. In any case, there is no mention at all of 
replacing categorical values with other values based on a probability. 

Claim 13 requires receiving perturbed values from the user computers, with the perturbed values 
representing randomized versions of the original values, and then generating a classification model using the 
perturbed values and not using the original values. As discussed above, Fayyad et al. nowhere teaches or 
suggests not using original values in generating the cluster model. Interestingly, this last limitation of Claim 
13 was not mentioned in the rejection. 
The Examiner is cordially invited to telephone the undersigned at (619) 338-8075 for any reason
which would advance the instant application to allowance. 

Respectfully submitted, 



JLR:jg 






John L. Rogitz
Registration No. 33,549
Attorney of Record
750 B Street, Suite 3120
San Diego, CA 92101
Telephone: (619) 338-8075
MARKED UP VERSION SHOWING CHANGES

1. (amended) A computer-implemented method for obtaining data from at least one user 
computer via the Internet while maintaining the privacy of a user of the computer, comprising the acts of: 
perturbing original data associated with the user computer to render perturbed data; [and] 
using a distribution of the perturbed data, generating at least one estimate of a distribution
of the original data; and
using the estimate of the distribution of the original data, generating at least one data mining
model.

7. (amended) A computer system including a program of instructions including structure 
to undertake method acts comprising: 

at a user computer, randomizing at least some original values of at least some numeric
attributes to render perturbed values; 

sending the perturbed values to a server computer not having access to the original values;
and

at the server computer, processing the perturbed values to generate at least one classification 

model. 

21. (amended) The method of Claim 20, wherein the user computer uses [used] the model
on original data to render a classification, and then sends the classification to the Web site. 
distance between z and w_i (or between a and w_i) is approximated to be the distance between the midpoints
of the intervals in which they lie. Also, the density function f_X(a) is approximated to be the average of the
density function in the interval in which the attribute "a" lies.
With this in mind, 

Pr'(X ∈ I_p) = (1/n) Σ (over s=1 to m) of {N(I_s) x [(f_Y(m(I_s) - m(I_p)) Pr(X ∈ I_p))] / [Σ (over t=1 to m)
of (f_Y(m(I_s) - m(I_t)) Pr(X ∈ I_t))]}, where

I(x) is the interval in which "x" lies, m(I_p) is the midpoint of the interval I_p, and f(I_p) is the
average value of the density function over the interval I_p, p=1,...,m.



Using the preferred method of partitioning into intervals, the step at block 46 can be undertaken in
O(m^2) time. It is noted that a naive implementation of the last of the above equations will lead to a
processing time of O(m^3); however, because the denominator is independent of I_p, the results of that
computation are reused to achieve O(m^2) time. In the presently preferred embodiment, the number "m" of
intervals is selected such that there are an average of 100 data points in each interval, with "m" being bound
10 ≤ m ≤ 100.
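
The interval-based update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `reconstruct_step` and `gaussian_pdf` are hypothetical, and zero-mean Gaussian noise is assumed for f_Y purely for the usage lines. The denominator is computed once per interval I_s and reused for every I_p, which is what yields O(m^2) rather than O(m^3) work.

```python
import math


def gaussian_pdf(d, sigma):
    """Density of zero-mean Gaussian noise (an assumed choice for f_Y)."""
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct_step(counts, midpoints, prob, noise_pdf):
    """One update Pr(X in I_p) -> Pr'(X in I_p) over m intervals.

    counts[s]    : N(I_s), number of perturbed points falling in interval I_s
    midpoints[p] : m(I_p), midpoint of interval I_p
    prob[p]      : current estimate Pr(X in I_p)
    noise_pdf    : f_Y, density of the perturbing noise
    """
    m = len(midpoints)
    n = sum(counts)
    new_prob = [0.0] * m
    for s in range(m):
        if counts[s] == 0:
            continue
        # weights[t] = f_Y(m(I_s) - m(I_t)) * Pr(X in I_t)
        weights = [noise_pdf(midpoints[s] - midpoints[t]) * prob[t] for t in range(m)]
        denom = sum(weights)  # independent of p, so computed once per s
        if denom == 0.0:
            continue
        for p in range(m):
            new_prob[p] += counts[s] * weights[p] / denom
    return [v / n for v in new_prob]


# Usage with made-up numbers: three intervals, uniform starting estimate.
estimate = reconstruct_step(
    counts=[5, 10, 5],
    midpoints=[0.5, 1.5, 2.5],
    prob=[1 / 3, 1 / 3, 1 / 3],
    noise_pdf=lambda d: gaussian_pdf(d, 1.0),
)
```

When every denominator is nonzero, the returned probabilities sum to one, because each interval I_s distributes its share N(I_s)/n across the intervals I_p.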

It is next determined at decision diamond 48 whether the stopping criterion for the iterative process 
disclosed above has been met. In one preferred embodiment, the iteration is stopped when the reconstructed 
distribution is statistically the same as the original distribution as indicated by a χ² goodness of fit test.
However, since the true original distribution is not known, the observed randomized distribution (of the
perturbed data) is compared with the [is compared with the] result of the current estimation for the
reconstructed distribution, and when the two are statistically the same, the stopping criterion has been met,
on the intuition that if these two are close, the current estimation for the reconstructed distribution is also
close to the original distribution.

When the test at decision diamond 48 is negative, the integration cycle counter "j" is incremented at
block 50, and the process loops back to block 46. Otherwise, the process ends at block 52 by returning the 
reconstructed distribution. 
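
The loop through blocks 46, 48, 50 and 52 can be sketched as follows. This is an illustrative sketch only: `step`, `expected_counts`, `threshold` and `max_iter` are hypothetical parameters introduced here, the threshold is assumed to be taken from a χ² table for a chosen significance level, and the comparison against expected counts is one possible reading of the goodness of fit test described above.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic; bins with zero expected count are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def reconstruct_until_stable(step, expected_counts, observed, prob, threshold, max_iter=1000):
    """Iterate the update until the goodness of fit statistic is small enough.

    step(prob)            : one update of the estimate (block 46)
    expected_counts(prob) : counts of perturbed data implied by the estimate
    observed              : observed counts of the perturbed data
    threshold             : assumed critical value for the chi-square statistic
    """
    j = 0  # cycle counter
    while True:
        prob = step(prob)  # block 46
        stat = chi_square_statistic(observed, expected_counts(prob))
        if stat <= threshold or j >= max_iter:  # decision diamond 48
            return prob, j  # block 52
        j += 1  # block 50


# Usage with a toy update that moves the estimate halfway toward a fixed target.
target = [0.3, 0.7]
final, cycles = reconstruct_until_stable(
    step=lambda p: [(a + b) / 2 for a, b in zip(p, target)],
    expected_counts=lambda p: [100 * x for x in p],
    observed=[30, 70],
    prob=[0.5, 0.5],
    threshold=0.01,
)
```

The `max_iter` guard is an addition for safety in the sketch; the passage itself describes only the statistical stopping criterion.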

Now referring to Figure 5, the logic for constructing a decision tree classifier using the reconstructed 
distribution is seen. Commencing at block 54, for [reach] each attribute in the set "S" of data points, a DO
loop is entered. Moving to block 56, split points for partitioning the data set "S" pursuant to growing the
data tree are evaluated. Preferably, the split points tested are those between intervals, with each candidate
split point being tested using the so-called "gini" index set forth in Classification and Regression Trees,
Breiman et al., Wadsworth, Belmont, 1984. To summarize, for a data set S containing "n" classes (which
can be predefined by the user, if desired) the "gini" index is given by 1 - Σ p_j^2, where p_j is the relative frequency
of class "j" in the data set "S". For a split dividing "S" into subsets S1 and S2, the index of the split is given
by:



index = n_1/n (gini(S1)) + n_2/n (gini(S2)), where n_1 = number of classes in S1 and n_2 =
number of classes in S2.
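
The "gini" index and the index of a split can be illustrated with a short sketch. The function names `gini` and `split_index` are hypothetical. As an assumption stated plainly, the weights n_1 and n_2 are taken here to be the numbers of data points in S1 and S2, with n their sum, which is the weighting used in Breiman et al.; the passage above words them differently.

```python
from collections import Counter


def gini(labels):
    """Return 1 - sum_j p_j^2, where p_j is the relative frequency of class j."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_index(left_labels, right_labels):
    """Weighted gini index of a split of S into S1 (left) and S2 (right)."""
    n1, n2 = len(left_labels), len(right_labels)
    n = n1 + n2
    return (n1 / n) * gini(left_labels) + (n2 / n) * gini(right_labels)


# Usage: a split that separates the classes perfectly scores lower than one that does not.
pure = split_index(["a", "a"], ["b", "b"])
mixed = split_index(["a", "b"], ["a", "b"])
```

A lower index indicates a better candidate split point, so candidates would be compared by this value.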

The data points are associated with the intervals by sorting the values, and assigning the N(I_1) lowest
values to the first interval, the next highest values to the next interval, and so on.
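
The association of data points with intervals by sorting can be sketched as follows. The function name `assign_to_intervals` is hypothetical, and the per-interval counts are assumed to be whole numbers that sum to the number of data points.

```python
def assign_to_intervals(values, counts):
    """Give the counts[0] lowest values to the first interval, the next
    counts[1] values to the second interval, and so on."""
    if sum(counts) != len(values):
        raise ValueError("interval counts must sum to the number of values")
    ordered = sorted(values)
    groups = []
    start = 0
    for count in counts:
        groups.append(ordered[start:start + count])
        start += count
    return groups


# Usage: five values spread over three intervals, the middle one empty.
groups = assign_to_intervals([5, 1, 4, 2, 3], [2, 0, 3])
```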